Goto

Collaborating Authors

 Watertown


Fact-checking AI-generated news reports: Can LLMs catch their own lies?

arXiv.org Artificial Intelligence

In this paper, we evaluate the ability of Large Language Models (LLMs) to assess the veracity of claims in ''news reports'' generated by themselves or other LLMs. Our goal is to determine whether LLMs can effectively fact-check their own content, using methods similar to those used to verify claims made by humans. Our findings indicate that LLMs are more effective at assessing claims in national or international news stories than in local news stories, better at evaluating static information than dynamic information, and better at verifying true claims compared to false ones. We hypothesize that this disparity arises because the former types of claims are better represented in the training data. Additionally, we find that incorporating retrieved results from a search engine in a Retrieval-Augmented Generation (RAG) setting significantly reduces the number of claims an LLM cannot assess. However, this approach also increases the occurrence of incorrect assessments, partly due to irrelevant or low-quality search results. This diagnostic study highlights the need for future research on fact-checking machine-generated reports to prioritize improving the precision and relevance of retrieved information to better support fact-checking efforts. Furthermore, claims about dynamic events and local news may require human-in-the-loop fact-checking systems to ensure accuracy and reliability.


Kernels of Selfhood: GPT-4o shows humanlike patterns of cognitive consistency moderated by free choice

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have surprised the scientific community and even their creators by exhibiting emergent abilities once thought to be uniquely human, such as advanced cognition and reasoning (1-6), although the full extent of these accomplishments is debated (3, 7-10). These capabilities align with the rational and deliberative aspects of human nature, but humans are not purely rational creatures, and it is unclear whether LLMs will mimic a broader spectrum of human psychological tendencies. Here we test whether OpenAI's GPT-4o replicates behaviors associated with the human tendency toward cognitive consistency as well as human sensitivity to choice, characterized by greater attitude shifts when the behaviors inducing these changes are freely chosen. Decades of research demonstrate that humans will irrationally twist their attitudes to align with behaviors they were induced to perform. For example, consider an individual who opposes single-payer healthcare, but volunteers, in response to a request for help, to craft an argument in favor of the policy. Rationally, this individual's attitude toward single-payer healthcare should not move in a more supportive direction; they should be able to discriminate between their genuine attitude and the opposing one that they have articulated only to be helpful.


Biomechanical Comparison of Human Walking Locomotion on Solid Ground and Sand

arXiv.org Artificial Intelligence

Current studies on human locomotion focus mainly on solid ground walking conditions. In this paper, we present a biomechanic comparison of human walking locomotion on solid ground and sand. A novel dataset containing 3-dimensional motion and biomechanical data from 20 able-bodied adults for locomotion on solid ground and sand is collected. We present the data collection methods and report the sensor data along with the kinematic and kinetic profiles of joint biomechanics. A comprehensive analysis of human gait and joint stiffness profiles is presented. The kinematic and kinetic analysis reveals that human walking locomotion on sand shows different ground reaction forces and joint torque profiles, compared with those patterns from walking on solid ground. These gait differences reflect that humans adopt motion control strategies for yielding terrain conditions such as sand. The dataset also provides a source of locomotion data for researchers to study human activity recognition and assistive devices for walking on different terrains.


ChatGPT as Research Scientist: Probing GPT's Capabilities as a Research Librarian, Research Ethicist, Data Generator and Data Predictor

arXiv.org Artificial Intelligence

How good a research scientist is ChatGPT? We systematically probed the capabilities of GPT-3.5 and GPT-4 across four central components of the scientific process: as a Research Librarian, Research Ethicist, Data Generator, and Novel Data Predictor, using psychological science as a testing field. In Study 1 (Research Librarian), unlike human researchers, GPT-3.5 and GPT-4 hallucinated, authoritatively generating fictional references 36.0% and 5.4% of the time, respectively, although GPT-4 exhibited an evolving capacity to acknowledge its fictions. In Study 2 (Research Ethicist), GPT-4 (though not GPT-3.5) proved capable of detecting violations like p-hacking in fictional research protocols, correcting 88.6% of blatantly presented issues, and 72.6% of subtly presented issues. In Study 3 (Data Generator), both models consistently replicated patterns of cultural bias previously discovered in large language corpora, indicating that ChatGPT can simulate known results, an antecedent to usefulness for both data generation and skills like hypothesis generation. Contrastingly, in Study 4 (Novel Data Predictor), neither model was successful at predicting new results absent in their training data, and neither appeared to leverage substantially new information when predicting more versus less novel outcomes. Together, these results suggest that GPT is a flawed but rapidly improving librarian, a decent research ethicist already, capable of data generation in simple domains with known characteristics but poor at predicting novel patterns of empirical data to aid future experimentation.


Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

arXiv.org Artificial Intelligence

On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.


CEO: Corpus-based Open-Domain Event Ontology Induction

arXiv.org Artificial Intelligence

Existing event-centric NLP models often only apply to the pre-defined ontology, which significantly restricts their generalization capabilities. This paper presents CEO, a novel Corpus-based Event Ontology induction model to relax the restriction imposed by pre-defined event ontologies. Without direct supervision, CEO leverages distant supervision from available summary datasets to detect corpus-wise salient events and exploits external event knowledge to force events within a short distance to have close embeddings. Experiments on three popular event datasets show that the schema induced by CEO has better coverage and higher accuracy than previous methods. Moreover, CEO is the first event ontology induction model that can induce a hierarchical event ontology with meaningful names on eleven open-domain corpora, making the induced schema more trustworthy and easier to be further curated.


Subtyping patients with chronic disease using longitudinal BMI patterns

arXiv.org Artificial Intelligence

Obesity is a major health problem, increasing the risk of various major chronic diseases, such as diabetes, cancer, and stroke. While the role of obesity identified by cross-sectional BMI recordings has been heavily studied, the role of BMI trajectories is much less explored. In this study, we use a machine-learning approach to subtype individuals' risk of developing 18 major chronic diseases by using their BMI trajectories extracted from a large and geographically diverse EHR dataset capturing the health status of around two million individuals for a period of six years. We define nine new interpretable and evidence-based variables based on the BMI trajectories to cluster the patients into subgroups using the k-means clustering method. We thoroughly review each cluster's characteristics in terms of demographic, socioeconomic, and physiological measurement variables to specify the distinct properties of the patients in the clusters. In our experiments, the direct relationship of obesity with diabetes, hypertension, Alzheimer's, and dementia has been re-established and distinct clusters with specific characteristics for several of the chronic diseases have been found to be conforming or complementary to the existing body of knowledge.


Forecasting labels under distribution-shift for machine-guided sequence design

arXiv.org Artificial Intelligence

The ability to design and optimize biological sequences with specific functionalities would unlock enormous value in technology and healthcare. In recent years, machine learning-guided sequence design has progressed this goal significantly, though validating designed sequences in the lab or clinic takes many months and substantial labor. It is therefore valuable to assess the likelihood that a designed set contains sequences of the desired quality (which often lies outside the label distribution in our training data) before committing resources to an experiment. Forecasting, a prominent concept in many domains where feedback can be delayed (e.g. elections), has not been used or studied in the context of sequence design. Here we propose a method to guide decision-making that forecasts the performance of high-throughput libraries (e.g. containing $10^5$ unique variants) based on estimates provided by models, providing a posterior for the distribution of labels in the library. We show that our method outperforms baselines that naively use model scores to estimate library performance, which are the only tool available today for this purpose.


Learning the shape of protein micro-environments with a holographic convolutional neural network

arXiv.org Artificial Intelligence

Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from structure remains a major challenge. Here, we introduce Holographic Convolutional Neural Network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein function, including stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.


Edge.org

#artificialintelligence

The conversation is on hold. The Edge community has hit the road... or they're staying home. Preparing for the academic year to begin, wrapping up projects and starting new ones, celebrating with family and friends or contemplating in solitude. After a hiatus, Edge is pleased to revive Summer Postcards: Edgies reporting in from wherever they are and on whatever they're doing, as the dog days wind out and the season comes to a close. As the world slowly returns to a "new normal" with enduring COVID restrictions in the midst of renewed vaccine freedoms, this year's collection is a testament to change (temporary and lasting), a consideration of loss (will travel ever be like it was?), and a celebration of questions (that still need answering). The hammock may be away until next year, but the memories remain. I spent the summer writing and revising the final section of a longish novel I started in 2019. It seems now as though I've been from 1946 to 2021 on my hands and knees. Various lockdowns have been a liberation from obligations and the luggage carousel, and I've never known such sweet and total focus for months on end. We have the luxury of living in the country--no shortage of big skies and moody walks. All our few breaks were in the UK--Scotland, the Lake District, the West country. Even in our remote part of the Lakes, I had to keep on writing--as in photo. The best novel I read this summer was Sandro Veronesi's The Hummingbird. Best non-fiction was Peter Godfrey Smith's Metazoa: Animal Life and the Birth of the Mind. I gave time also to some wonderful novellas--perfect fictional form for you too-busy scientists. IAN MCEWAN is a novelist whose works have earned him worldwide critical acclaim. He is the recipient of the Man Booker Prize for Amsterdam (1998), the National Book Critics' Circle Fiction Award, and the Los Angeles Times Prize for Fiction for Atonement (2003). His most recent novel is Machines Like Me. In 2019, Časlav Brukner and myself were walking on a beach on Lamma Island, near Hong Kong, marvelling together at the astonishing strangeness of quantum phenomena. This summer, the conversation with Časlav has continued on another island, and quite an island: Lesbos, the northern Greek island near the Turkish coast. Lesbos is the place where lyrical poetry was born. Here lived Sappho and Alcaeus.